XML Data Transformation and Integration — A Schema Transformation Approach
Abstract
The process of transforming and integrating XML data involves resolving the syntactic, semantic and schematic heterogeneities that the data sources present. Moreover, there are a number of different application settings in which such a process could take place, such as centralised or peer-to-peer settings, each of which needs to be considered separately. In this thesis, we investigate the problem of data transformation and integration for XML data sources. This data format presents a number of challenges that require XML-specific solutions: a schema is not required for an XML data source, and if one exists, it may be expressed in a number of different XML schema types; also, resolving schematic heterogeneity is not straightforward due to the hierarchical nature of XML data. We propose a modular approach, based on schema transformations, that handles the distinct problems of syntactic, semantic and schematic heterogeneity of XML data. We handle the problem of syntactic heterogeneity of XML schema types by introducing a new, automatically derivable schema type for XML data sources, designed specifically for the purposes of XML data transformation and integration. We show how semantic heterogeneity can be handled in our approach using existing methods, and we also propose a new semi-automatic method for resolving semantic heterogeneity using mappings to ontologies as a ‘semantic bridge’. We then present a new schema restructuring method that handles schematic heterogeneity automatically, assuming that semantic heterogeneity issues have been resolved. The contribution of this thesis is the investigation of the problem of XML data
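As a rough illustration of what an automatically derivable schema for a schema-less XML source might look like, the sketch below summarises a document as the ordered list of distinct root-to-node paths it contains. This path-based summary is an assumption made for illustration only; it is not the schema type defined in the thesis, and the names `derive_structure` and `SAMPLE` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the source does not provide one.
SAMPLE = """<library>
  <book id="1"><title>A</title><author>X</author></book>
  <book id="2"><title>B</title><year>2001</year></book>
</library>"""


def derive_structure(xml_text):
    """Return the distinct element, attribute and text paths of a document,
    in order of first occurrence (an illustrative structural summary)."""
    root = ET.fromstring(xml_text)
    seen = set()
    paths = []

    def add(path):
        # Record each distinct path once, preserving first-seen order.
        if path not in seen:
            seen.add(path)
            paths.append(path)

    def visit(elem, prefix):
        path = prefix + "/" + elem.tag
        add(path)
        for name in elem.attrib:
            add(path + "/@" + name)
        # Whitespace-only text is treated as formatting, not content.
        if elem.text and elem.text.strip():
            add(path + "/text()")
        for child in elem:
            visit(child, path)

    visit(root, "")
    return paths


if __name__ == "__main__":
    for p in derive_structure(SAMPLE):
        print(p)
```

Repeated elements such as the two `book` entries collapse into a single path, so the summary describes the structure of the data rather than its instances, and it can be computed without any DTD or XML Schema being present.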
Related resources
Information Sharing for the Semantic Web - a Schema Transformation Approach
This paper proposes a framework for transforming and integrating heterogeneous XML data sources, making use of known correspondences from them to ontologies expressed in the form of RDFS schemas. Our algorithms generate schema transformation/integration rules which are implemented in the AutoMed heterogeneous data integration system. The paper first illustrates how correspondences to a single o...
Using AutoMed for XML data transformation and integration
This paper describes how the AutoMed data integration system is being extended to support the integration of heterogeneous XML documents. So far, the contributions of this research have been the development of two algorithms. One restructures the schema describing an XML document into another schema, and the other materialises an integrated schema resulting from the transformation of several so...
Towards the Preservation of Keys in XML Data Transformation for Integration
Transforming a source schema and its conforming data into a target schema and its conforming data is an important activity in XML, as two XML schemas can represent the same real-world information. In XML data integration specifically, transforming a source into a target is regarded as an important task. An XML source schema can often be defined with an XML key, which is an important integrity...
Schema Translations by XSLT for GML-encoded Geospatial Data in Heterogeneous Web-service Environment
The paper discusses online integration of XML-encoded datasets in the current Web services environment, especially concentrating on the required schema transformations. The approach is based on the use of a generic XML transformation technology, called Extensible Stylesheet Language Transformations (XSLT). The role of the data integration process in a layered service architecture framework is d...
An Approach for Generating an XML Data Warehouse Schema using Model Transformation Language
Traditionally, the multidimensional schema of the data warehouse is derived from data sources that are mainly the company’s internal data, well-known and structured, by identifying facts, dimensions and numeric measurements through a manual analysis of the operational schemas. With the proliferation of new platforms of communication in today’s information societies, there has been growing numbe...